41 research outputs found

    Self- and Super-organizing Maps in R: The kohonen Package

    In this age of ever-increasing data set sizes, especially in the natural sciences, visualisation becomes more and more important. Self-organizing maps have many features that make them attractive in this respect: they do not rely on distributional assumptions, can handle huge data sets with ease, and have shown their worth in a large number of applications. In this paper, we highlight the kohonen package for R, which implements self-organizing maps as well as some extensions for supervised pattern recognition and data fusion.

    Spectral-spatial classification of hyperspectral images: three tricks and a new supervised learning setting

    Spectral-spatial classification of hyperspectral images has been the subject of many studies in recent years. In the presence of only very few labeled pixels, this task becomes challenging. In this paper we address the following two research questions: 1) Can a simple neural network with just a single hidden layer achieve state-of-the-art performance in the presence of few labeled pixels? 2) How is the performance of hyperspectral image classification methods affected when using disjoint train and test sets? We give a positive answer to the first question by using three tricks within a very basic shallow Convolutional Neural Network (CNN) architecture: a tailored loss function, and smooth- and label-based data augmentation. The tailored loss function enforces that neighboring wavelengths have similar contributions to the features generated during training. A new label-based technique proposed here favors selection of pixels in smaller classes, which is beneficial in the presence of very few labeled pixels and skewed class distributions. To address the second question, we introduce a new sampling procedure to generate disjoint train and test sets. The train set is then used to obtain the CNN model, which is applied to pixels in the test set to estimate their labels. We assess the efficacy of the simple neural network method on five publicly available hyperspectral images. On these images our method significantly outperforms the considered baselines. Notably, with just 1% of labeled pixels per class, our method achieves an accuracy on these datasets that ranges from 86.42% (challenging dataset) to 99.52% (easy dataset). Furthermore, we show that the simple neural network method improves over other baselines in the new challenging supervised setting. Our analysis substantiates the highly beneficial effect of using the entire image (that is, both train and test data) for constructing a model. Comment: Remote Sensing 201

    Chemometrics for ion mobility spectrometry data: Recent advances and future prospects

    Historically, advances in the field of ion mobility spectrometry have been hindered by the variation in measured signals between instruments developed by different research laboratories or manufacturers. This has triggered the development and application of chemometric techniques able to reveal and analyze the valuable information content of ion mobility spectra. Recent advances in multidimensional coupling of ion mobility spectrometry to chromatography and mass spectrometry have created new, unique challenges for data processing, yielding high-dimensional, megavariate datasets. In this paper, a complete overview of available chemometric techniques used in the analysis of ion mobility spectrometry data is given. We describe the current state of the art of ion mobility spectrometry data analysis, comprising datasets with different complexities and two different scopes of data analysis, i.e. targeted and non-targeted analyte analyses. Two main steps of data analysis are considered: data preprocessing and pattern recognition. A detailed description of recent advances in chemometric techniques is provided for these steps, together with a list of interesting applications. We demonstrate that chemometric techniques have contributed significantly to the recent and great expansion of ion mobility spectrometry technology into different application fields. We conclude that well-thought-out, comprehensive data analysis strategies are currently emerging, including several chemometric techniques and addressing different data challenges. In our opinion, this trend will continue in the near future, stimulating developments in ion mobility spectrometry instrumentation even further.

    Simultaneous analysis of plasma and CSF by NMR and hierarchical models fusion

    Because cerebrospinal fluid (CSF) is the biofluid which interacts most closely with the central nervous system, it holds promise as a reporter of neurological disease, for example multiple sclerosis (MScl). To characterize the metabolomics profile of neuroinflammatory aspects of this disease we studied an animal model of MScl, experimental autoimmune/allergic encephalomyelitis (EAE). Because CSF also exchanges metabolites with blood via the blood–brain barrier, malfunctions occurring in the CNS may be reflected in the biochemical composition of blood plasma. The combination of blood plasma and CSF provides more complete information about the disease. Both biofluids can be studied by use of NMR spectroscopy. It is then necessary to perform combined analysis of the two different datasets. Mid-level data fusion was therefore applied to blood plasma and CSF datasets. First, relevant information was extracted from each biofluid dataset by use of linear support vector machine recursive feature elimination. The selected variables from each dataset were concatenated for joint analysis by partial least squares discriminant analysis (PLS-DA). The combined metabolomics information from plasma and CSF enables more efficient and reliable discrimination of the onset of EAE. Second, we introduced hierarchical models fusion, in which previously developed PLS-DA models are hierarchically combined. We show that this approach enables neuroinflamed rats (even on the day of onset) to be distinguished from either healthy or peripherally inflamed rats. Moreover, progression of EAE can be investigated because the model separates the onset and peak of the disease.